-
Notifications
You must be signed in to change notification settings - Fork 732
feat: data pipeline changes for segmentRepositories [IN-552] [IN-554] #3326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: data pipeline changes for segmentRepositories [IN-552] [IN-554] #3326
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conventional Commits FTW!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements the infrastructure to propagate archived and excluded repository data from the segmentRepositories
table through the TinyBird pipeline to support enhanced project filtering and search capabilities.
- Adds database migration to enable replication of the
segmentRepositories
table to TinyBird - Creates a new
segmentRepositories
data source in TinyBird and updates pipeline to aggregate archived/excluded repository lists - Propagates archived repositories data through the insights project pipeline to enable repository-aware search functionality
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
backend/src/database/migrations/V1753798345__segmentRepositories_replication.sql | Enables replication of segmentRepositories table to TinyBird |
services/libs/tinybird/datasources/segmentRepositories.datasource | Defines TinyBird data source schema for segmentRepositories table |
services/libs/tinybird/datasources/insights_projects_populated_ds.datasource | Adds archived/excluded repositories fields to project data source |
services/libs/tinybird/pipes/insights_projects_populated_copy.pipe | Aggregates archived/excluded repositories and joins to project data |
services/libs/tinybird/pipes/insightsProjects_filtered.pipe | Passes through archived/excluded repositories to filtered results |
services/libs/tinybird/pipes/search_collections_projects_repos.pipe | Exposes archived repositories in search results |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
services/libs/tinybird/pipes/insights_projects_populated_copy.pipe
Outdated
Show resolved
Hide resolved
backend/src/database/migrations/V1753798345__segmentRepositories_replication.sql
Show resolved
Hide resolved
services/libs/tinybird/datasources/insights_projects_populated_ds.datasource
Outdated
Show resolved
Hide resolved
services/libs/tinybird/datasources/segmentRepositories.datasource
Outdated
Show resolved
Hide resolved
services/libs/tinybird/datasources/segmentRepositories.datasource
Outdated
Show resolved
Hide resolved
Ping @epipav |
eea8f5c
to
4501d3e
Compare
@borfast can you confirm if the PR is up to date with the latest comments? And if so, can you gather all the questions you need to align with the team and perhaps go over them on a quick sync |
4501d3e
to
3a3d9af
Compare
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
…olumns Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
e78438a
to
f7f5eb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only issue is the ENGINE_VER
of the new datasource, other than that looks good 👍
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
83c54c6
to
1d7cd18
Compare
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
In the context of IN-524, we want to have the archived and excluded repositories for each project returned along with the project data.
This means adding the repository data to TinyBird, which already includes the archived and excluded flags in the
segmentRepositories
table, and then merge that data into the existing pipes that return project data, namely theprojects_list
and thesearch_collections_projects_repos
pipes.This PR:
segmentRepositories
table into Sequin.segmentRepositories
data source in TinyBird to receive data from the aforementioned table.insights_projects_populated_copy
pipe to include the list of archived and excluded repositories, which is then propagated to theinsights_projects_populated_ds
data source, from there it feeds theinsightsProjects_filtered
pipe, and finally theprojects_list
pipe, which is the end goal.search_collections_projects_repos
containsarchived
andexcluded
columns for the repositories results. This required adding aJOIN
to the query, and with all the optimisations that were done not too long ago, I'm not sure if thisSpecific tickets in Jira: